For this practice, we are going to explore a Pokemon data set from Kaggle that was featured as a dataset of the day in a previous year (https://www.kaggle.com/nayansolanki2411/world-of-pokemon).

1. Reading the data

  1. First read the data into a data frame called pokemon and use the janitor package to clean up the column names.
library(tidyverse)  # read_csv(), %>%, select(), glimpse()
library(janitor)    # clean_names()
pokemon <- read_csv("data/Pokemon.csv") %>% clean_names()
## Rows: 800 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Name, Type 1, Type 2
## dbl (9): #, Total, HP, Attack, Defense, Sp. Atk, Sp. Def, Speed, Generation
## lgl (1): Legendary
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(pokemon)
##  [1] "number"     "name"       "type_1"     "type_2"     "total"     
##  [6] "hp"         "attack"     "defense"    "sp_atk"     "sp_def"    
## [11] "speed"      "generation" "legendary"
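To see what clean_names() did, here is a minimal sketch on a hypothetical one-row data frame (`raw` and `cleaned` are illustrative names) whose headers copy three of the raw CSV column names from the column specification above. It assumes janitor is installed. The cleaned names match those printed by names(pokemon).

```r
library(janitor)

# Hypothetical stand-in for three raw CSV headers; check.names = FALSE keeps
# the original, non-syntactic names ("#", "Type 1", "Sp. Atk") intact.
raw <- data.frame(`#` = 1, `Type 1` = "Grass", `Sp. Atk` = 65,
                  check.names = FALSE)
names(raw)      # "#" "Type 1" "Sp. Atk"

# clean_names() lower-cases, replaces spaces and punctuation with "_",
# and spells out "#" as "number".
cleaned <- clean_names(raw)
names(cleaned)  # "number" "type_1" "sp_atk"
```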

For the questions that follow, it will be convenient to also have two smaller Pokemon data frames: one containing only the double (numeric) columns and one containing only the categorical columns (other than name). The following commands show one way of doing this.

pokemon_dbl <- pokemon %>% select(where(is.double))  # numeric (double) columns only
pokemon_fac <- pokemon %>% select(type_1, type_2)    # categorical columns, excluding name
glimpse(pokemon_dbl)
## Rows: 800
## Columns: 9
## $ number     <dbl> 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 11, 12, 13, 14, …
## $ total      <dbl> 318, 405, 525, 625, 309, 405, 534, 634, 634, 314, 405, 530,…
## $ hp         <dbl> 45, 60, 80, 80, 39, 58, 78, 78, 78, 44, 59, 79, 79, 45, 50,…
## $ attack     <dbl> 49, 62, 82, 100, 52, 64, 84, 130, 104, 48, 63, 83, 103, 30,…
## $ defense    <dbl> 49, 63, 83, 123, 43, 58, 78, 111, 78, 65, 80, 100, 120, 35,…
## $ sp_atk     <dbl> 65, 80, 100, 122, 60, 80, 109, 130, 159, 50, 65, 85, 135, 2…
## $ sp_def     <dbl> 65, 80, 100, 120, 50, 65, 85, 85, 115, 64, 80, 105, 115, 20…
## $ speed      <dbl> 45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58, 78, 78, 45, …
## $ generation <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
glimpse(pokemon_fac)
## Rows: 800
## Columns: 2
## $ type_1 <chr> "Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire", "Fi…
## $ type_2 <chr> "Poison", "Poison", "Poison", "Poison", NA, NA, "Flying", "Drag…

2. Using loops

  1. Write a for loop that calculates the mean of each variable in pokemon_dbl.

  2. The function fivenum() calculates the five-number summary of a vector: the minimum, Q1, median, Q3, and maximum, where Q1 and Q3 are approximately the 25th and 75th percentiles (strictly, fivenum() returns Tukey's lower and upper hinges, which can differ slightly from the values quantile() reports). How would you modify your loop above to store the five-number summaries? Explore different approaches to storing the results.

  3. Write a for loop that converts each of the two categorical variables into factors. Your loop should work even if there were more than two columns. Use glimpse() to check that you modified the variables as required.
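One possible set of loop solutions, sketched on small hypothetical stand-ins (`pokemon_dbl_demo` and `pokemon_fac_demo`) built from the first few rows of the glimpse() output above so that the code runs without the CSV file. Substitute pokemon_dbl and pokemon_fac to run it on the full data. The sketch uses only base R, so str() stands in for glimpse().

```r
# Hypothetical stand-ins: first rows of two numeric and two categorical
# columns, copied from the glimpse() output above.
pokemon_dbl_demo <- data.frame(hp     = c(45, 60, 80, 80),
                               attack = c(49, 62, 82, 100))
pokemon_fac_demo <- data.frame(type_1 = c("Grass", "Fire"),
                               type_2 = c("Poison", NA))

# Task 1: preallocate a named output vector, then fill it column by column.
means <- vector("double", ncol(pokemon_dbl_demo))
names(means) <- names(pokemon_dbl_demo)
for (i in seq_along(pokemon_dbl_demo)) {
  means[i] <- mean(pokemon_dbl_demo[[i]])
}
means  # hp = 66.25, attack = 73.25

# Task 2, option A: store each five-number summary as a list element.
fivenum_list <- vector("list", ncol(pokemon_dbl_demo))
names(fivenum_list) <- names(pokemon_dbl_demo)
for (i in seq_along(pokemon_dbl_demo)) {
  fivenum_list[[i]] <- fivenum(pokemon_dbl_demo[[i]])
}

# Task 2, option B: store the summaries in a matrix, one column per variable.
fivenum_mat <- matrix(NA_real_, nrow = 5, ncol = ncol(pokemon_dbl_demo),
                      dimnames = list(c("min", "lower_hinge", "median",
                                        "upper_hinge", "max"),
                                      names(pokemon_dbl_demo)))
for (i in seq_along(pokemon_dbl_demo)) {
  fivenum_mat[, i] <- fivenum(pokemon_dbl_demo[[i]])
}

# Task 3: convert every column to a factor; looping over names() means the
# loop works no matter how many categorical columns there are.
for (col in names(pokemon_fac_demo)) {
  pokemon_fac_demo[[col]] <- factor(pokemon_fac_demo[[col]])
}
str(pokemon_fac_demo)
```

Preallocating the output with vector() or matrix() avoids growing an object one element at a time. A list is the most flexible container for the summaries, while the matrix gives a compact table with one column per variable.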

3. Using purrr

Repeat tasks 1-3 of Section 2 using appropriate functions from the purrr package instead of loops. Which approach do you prefer, and why?
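A sketch of the same three tasks with purrr, on hypothetical stand-ins (`pokemon_dbl_demo`, `pokemon_fac_demo`) taken from the first rows of the glimpse() output above. map_dbl(), map(), and modify() are purrr functions; turning the list of summaries into a data frame with as.data.frame() is just one choice of container.

```r
library(purrr)

# Hypothetical stand-ins built from the first rows shown by glimpse().
pokemon_dbl_demo <- data.frame(hp     = c(45, 60, 80, 80),
                               attack = c(49, 62, 82, 100))
pokemon_fac_demo <- data.frame(type_1 = c("Grass", "Fire"),
                               type_2 = c("Poison", NA))

# Task 1: map_dbl() applies mean() to each column and returns a named double vector.
means <- map_dbl(pokemon_dbl_demo, mean)

# Task 2: map() returns a named list with one five-number summary per column...
fivenum_list <- map(pokemon_dbl_demo, fivenum)
# ...which can be reshaped into a data frame with one column per variable.
fivenum_df <- as.data.frame(fivenum_list)
rownames(fivenum_df) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

# Task 3: modify() applies factor() to every column and keeps the data frame type.
pokemon_fac_demo <- modify(pokemon_fac_demo, factor)
str(pokemon_fac_demo)
```

Compared with the loops, the purrr calls need no preallocation or index bookkeeping, and map_dbl() states the expected output type up front; which style reads better is partly a matter of taste.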

4. Just for fun: A Quick Intro to plotly

I have installed the plotly package for interactive graphics in this project space. (Google it if you’d like to know more.) I noticed someone had used it for a Pokemon analysis on Kaggle (https://www.kaggle.com/nayansolanki2411/world-of-pokemon) and thought you might like to get a taste. Below is a single plot_ly() call for you to run; explore the plot it produces. From what we’ve done with ggplot2 and elsewhere, you should be able to make good sense of the code. Play around with it to see what other interesting plots you can come up with for the pokemon data. (Note that such interactive plots can be built into Shiny web apps for truly interactive data exploration.)

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(pokemon, x = ~sp_def, y = ~sp_atk, type = "scatter",
        mode = "markers", size = ~total, color = ~legendary, text = ~name)
## Warning: `line.width` does not currently support multiple values.
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
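As one example of another plot to try, here is a sketch of an interactive box plot of total by primary type. It uses a hypothetical eight-row stand-in (`pokemon_demo`) copied from the first rows of the glimpse() output above; pass the full pokemon data frame to plot_ly() instead to see every type.

```r
library(plotly)

# Hypothetical stand-in: first eight rows of type_1 and total from glimpse().
pokemon_demo <- data.frame(
  type_1 = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire", "Fire", "Fire"),
  total  = c(318, 405, 525, 625, 309, 405, 534, 634)
)

# One box per primary type, with total on the y-axis.
p <- plot_ly(pokemon_demo, x = ~type_1, y = ~total, type = "box")
```

Printing p (typing p at the console, or leaving it as the last line of a chunk) displays the interactive plot; hovering over a box shows its summary statistics.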